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Compressive auto-indexing in femtosecond nanocrystallography 
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Ultrafast nanocrystallography has the potential to revolutionize biology by enabling structural 
elucidation of proteins for which it is possible to grow crystals with 10 or fewer unit cells on the side. 
The success of nanocrystallography depends on robust orientation-determination procedures that 
allow us to average diffraction data from multiple nanocrystals to produce a three dimensional (3D) 
diffraction data volume with a high signal-to-noise ratio. Such a 3D diffraction volume can then be 
phased using standard crystallographic techniques. "Indexing" algorithms used in crystallography 
enable orientation determination of diffraction data from a single crystal when a relatively large 
number of reflections are recorded. Here we show that it is possible to obtain the exact lattice 
geometry from a smaller number of measurements than standard approaches using a basis pursuit 
solver. 

Keywords: crystallography; indexing; compressive sensing 



I. INTRODUCTION 

X-ray crystallography is currently the leading method 
for atomic resolution imaging of macromolecules. Third 
generation synchrotron sources permit successful struc- 
ture solution from crystals 5 microns in size or greatei^. 
Obtaining sufficiently large crystals is currently an im- 
portant stumbling block in structure determination. 

The Linac Coherent Light Source (LCLS) recently be- 
gan operation 2 at the SLAC National Accelerator Lab- 
oratory in Palo Alto, California, using energetic elec- 
trons from a linear accelerator to produce coherent x-rays 
with an instrument called a free electron laser (FEL). 
Free Electron Laser sources produce pulses of light that 
are over 9 orders of magnitude brighter than current 
third generation synchrotron sources 3 . Several other x- 
ray laser sources of this type are being built or planned 
worldwide^. 



The high number of photons incident on a specimen 
are expected to produce measurable diffraction patterns 
from nanocrystals perhaps even down to a single period 
(single molecule]P, enabling high resolution structure elu- 
cidation of systems which can only be crystallized into 
very small crystals that are not suitable for conventional 
crystallography. Even for larger crystals, the short pulses 
can circumvent the radiation damage proble which 
limits the resolution of many sensitive crystals. 

In such an experiment, two-dimensional (2D) diffrac- 
tion images of randomly oriented nanocrystal of the same 
type can be captured within an exposure time of a few 
femtoseconds. These images can then be used to deduce 
the 3D structure of the molecule. To see the structure in 
3-D, one has to merge the data from all these individual 
nanocrystals, whose orientations are not known. 

Femtosecond nanocrystallography brings new chal- 
lenges to data processing 9 '. One problem is that the ori- 
entation of each diffraction image obtained is unknown. 
Another problem is that a single snapshot of the crystal 
diffraction pattern may contain very few reflections. In 



traditional crystallography, a small angular range of inte- 
gration ensures that many Bragg reflections are recorded 
while ensuring that overlaps are minimized. This is not 
possible with ultrafast x-ray pulses. The relentless im- 
provements of these light sources (beam energy, beam 
divergence and wavelength) will further exacerbate the 
problem. 

These new difficulties make indexing such patterns a 
hard problem for existing crystallographic software. 



II. STRUCTURE DETERMINATION FROM 
CRYSTAL DIFFRACTION 




Detector 
Plane 



FIG. 1: Scattering geometry for coherent x-ray diffraction 
imaging. 

The diffracted photon flux / (photons/pulse/ pixel) 
produced by a crystallite is given by 



J(q) = J r 2 e PAn\F(R^)f 



(1) 



$ (#*q - {.hh + kk + Zi)) 



h,k,l 



where -F(q) is the continuous scattering from one unit 
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cell (molecule), is the 3D rotation matrix of the un- 
known object orientation, q a 3D vector that relates the 
Bragg "reflection" on a two dimensional detector to a 
point in a 3D Fourier space, J Q is the incident photon 
flux density (photons/pulse/area), r 2 e is the electron cross 
section, P is a polarization factor, AQ is the solid angle 
subtended by a detector pixel at the sample, the (ft, fc, Z) 
integer values are called Miller indices, (h, k, 1) identify 
the Bravais lattice characteristic of the crystal periodic 
structure, and S is the shape transform of the crystallite 
finite dimensions. For large crystals, S is simply a Dirac 
^-function. For a crystal made of a few unit cells, S is 
broadened and may introduce an error in the location of 
the reflection. 

Each set of pixel coordinates on a detector placed at 
a distance z& from the sample is Pij, ZD ~ (Pi* + PjY + 
zdz) and corresponds to a value of q in the 3D reciprocal 
space according to the geometric description of elastic 
scattering shown in Figure [I] 

In this figure, ki n and k out are the incident and scat- 
tered wave vectors that satisfy |ki n | = |k out | = k = 1/A, 
where A is the wavelength of the x-ray. The direction 
of ki n and k out are the same as the direction of the 
incident beam k in = fc(0,0, 1) and the outgoing beam 
k ut = kpij jZD . The coordinates of a lattice point q^j 
corresponding to Pij jZD satisfy 

= k ut ki n , 

= T f#^4-(0,0,l)V (2) 

The end point of the vector q lies on a 2D surface called 
the Ewald sphere. This spherical surface of radius k in- 
tersects the origin (q = when p = (0,0, zd)), and is 
centered at (0, 0, —k) (while the origin of Pij jZD is at the 
sample). 

In traditional crystallography, the most common data 
collection method is the rotation method in which the 
diffraction image is collected while rotating the sample, 
i.e. varying R^ in Eq. [I] 

A small angular range of integration ensures that all 
Bragg reflections are recorded while overlaps are min- 
imized. The strength (measured intensity) and orienta- 
tion of each Bragg reflection is estimated from the diffrac- 
tion geometry (including source divergence, bandwidth, 
pixel size and angular average). 

In x-ray crystallography, the term indexing refers to 
the task of assigning the measured Bragg peaks to the 
discrete locations (ft, /c, I) of a periodic lattice. Auto- 
indexing uses the position of these peaks to deduce the 
shape (h, k, 1) and orientation (R^) of the lattice, and to 
identify the lattice coordinates (ft, fc, I) of each measured 
peak. 

It is accomplished in several steps. 

• For the purpose of autoindexing, one can simply as- 
sign the value of 1 to /(qij) for every peak above 
a noise threshold. As a result, one obtains a 3D 



map 6(q) in the reciprocal space that contains the 
values of either 1 or 0. Note that b(q) is only af- 
fected by the content of a unit cell when |-F(q)| is so 
small that the reflection is not detected. Assuming 
5(q) ~ (S(q), Eq. ([!]) becomes: 

6(q) ~ 6 ~ ( ftt + + ")) ' ( 3 ) 

h,k,l 

• Some type of computational analysis is performed 
on the 3D map to ascertain the orientation and 
the unit cell parameters of the crystal (R^, h, k, 1). 
The analysis typically proceeds by making use of 
Fourier transform and peak searches. An efficient 
algorithm that uses many ID Fourier transform was 
proposed ir l 1Q | 11 i It is used in many existing autoin- 
dexing software packages such as MOSFLM^. We 
will provide details of these algorithms in the next 
section, as this problem will be the focus of our 
paper. 

• Once the lattice vectors and orientation are deter- 
mined, the lattice coordinates that overlap with the 
Ewald sphere will provide the index of a reflection. 
Multiple solutions due to mirror symmetries of the 
lattice are generated. These solutions can be dis- 
tinguished using the measured intensities. 

Once the orientation and the unit cell parameters asso- 
ciated with a crystal has been determined, one may then 
proceed to estimate the intensities of the crystal, from the 
diffraction geometry (including source divergence, band- 
width, pixel size and angular average). Finally, a phase 
retrieval algorithm is used to recover the phase of the 
Fourier transform and subsequently the 3D density map 
of the crystal. 

For the purpose of this paper, we will not discuss the 
issues of structure factor determination or the phase re- 
trieval problem. Instead, we will focus on the second 
item of the autoindexing problem, how to determine the 
lattice parameters and orientation. 

Multiple solutions due to symmetries of the lattice (but 
not of the crystal) will still have to be sorted out using 
measured intensities. In this paper we do not address 
this problem, which presents another challenge when at- 
tempting to merge many thousand of low-signal snap- 
shots. 



III. REAL SPACE AUTOINDEXING 

Most autoindexing algorithms search for peaks in real 
space, by applying some form of 3D Fourier transform of 
the binary reciprocal space map b(q). 

If the region of q— space that was measured is large, 
its 3D FT will provide the real space lattice. A simple 
numerical thresholding may reveal the positions of the 
3D lattice points in real space. They can subsequently 
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be used to determine the unit cell parameters, crystal 
orientation and type. 

The binary 3D mask M(q) that defines the region of 
q— space that was measured (M(q)=l if q was measured, 
=0 otherwise) can be viewed as a 3-dimensional optical 
transfer function. The 3D Fourier transform of the binary 
mask forms a point spread function which convolves the 
real space lattice. If the PSF is larger than the lattice 
spacing, these cannot be resolved. 

As we will show in section |VI[ when the number of 
measured Bragg peaks is less than 10, the real space lat- 
tice points cannot be easily distinguished from the rest of 
the sampled voxels based on the intensity of the inverse 
3D Fourier transform. 

The use of a 3D Fourier transform around the ori- 
gin for indexing a diffraction pattern was suggested over 
two decades agcpSl. A similar approach appears to have 
been incorporated in the program DENZO, which has 
been distributed as part of the diffraction-image pro- 
cessing suite HKlP. A 3D FFT has been used to in- 
dex diffraction images by calculating a Patterson func- 
tion from a set of reflections which have all been as- 
signed unit intensity^. Efficient software implementa- 
tions avoid the use of a full 3D Fourier transform, by 
using the Fourier projection-slice theorem^, calculating 
ID sections of the 3D FTs by a series of projections 
and 1-dimensional FFTs^"^. Indexing software such as 
MOSFLIVPI LABELITCSI ut ili ze this approach. 

The complexity of the projection-FFT approach is 
mn log n, where m is the number of direction vectors that 
must be generated, and n is the number of samples along 
the projected ID intensity profile, which is proportional 
to TV 1 / 3 , where N is the total number of sampled voxels 
contained in the crystal. A typical value of m is between 
5,000 and 20,000. Clearly, this method will not work well 
if the number of Bragg points on a diffraction pattern is 
small. 

Although the argument used in 11 for abandoning the 
full 3D FFT is the high cost for performing 3D FFTs of 
large crystals, this is no longer a serious issue due to the 
rapid growth in the processing speed and memory capac- 
ity of modern multi-core microprocessors. At the time of 
writing, a 512 3 3D FFT takes about 0.15 seconds on a 
GPU processor, while a 2048 3 takes about 7 minutes on a 
machine with sufficient memory. Furthermore, there are 
now algorithms that we may use to take full advantage of 
the sparsity of the 3D reciprocal space map 19 , i.e., there 
are a few non-zeros in the 3D map constructed from the 
Bragg reflections, and reduce the complexity of the 3D 
FFT calculation from the standard 0(N log TV) to that of 
0(N 2 / 3 log TV), where N is the total number of sampled 
voxels in the crystal. 



IV. RECOVERING REAL SPACE LATTICE VIA 
LI MINIMIZATION 

An alternative technique for retrieving the positions of 
the real (and reciprocal) space lattice points associated 
with a crystal is to use the recently developed compres- 
sive sensing methodology 20 ^ and formulate the problem 
as an LI minimization problem. 

Let x be a vector representation of the 3D density map 
of a crystal lattice to be determined in real space. Similar 
to the 3D inverse Fourier transform approach, we will 
use the magnitude of each component of x to determine 
whether the 3D voxel associated with that component is 
a real space lattice point. 

The vector x is related to the diffraction measurement 
through the following equation: 

b i = eJ.J r x, for i = 1, 2, 2m, (4) 

where bi is the zth pixel of the binary image correspond- 
ing to a sampled 3D reciprocal space voxel that lies on 
the Ewald sphere, m is the total number of voxels on the 
Ewald sphere, T is the matrix representation of a 3D dis- 
crete Fourier transform, ji is the index of the voxel (in x) 
that lie on the Ewald sphere, and ej. is the j'^th column 
of the identity matrix. To ensure that x is real, Friedel's 
conjugate symmetry is imposed on b. Therefore, we have 
2m equations in Q even though the number of samples 
on the Ewald sphere is m. 

Because the number of sampled voxels on a Ewald 
sphere is always far fewer than the number of reciprocal 
lattice points, the linear system defined by Q is clearly 
underdetermined. Therefore, x cannot be recovered by 
simply solving Q. However, because the vector x to be 
recovered is expected to be "sparse", i.e., it is expected 
to have nonzero values at a subset of real space voxels, it 
follow s from the recently developed compressive sensing 
theor^i] 2 ^, ^Yi&t W e may be able to recover x by solving 
the following convex minimization problem 

min * Nli (*\ 

such that MTx = 6, y J 

where M is an 2m x n sparse "sensing" matrix that con- 
tains ej., i = 1,2, ...,2m, as its rows, b is a vector rep- 
resentation of the intensity values (0's and l's) assigned 
to voxels on the Ewald sphere (and its Friedel symmetric 
counterpart), || • ||i denotes the L\ norm of a vector. 

The equality constraint in ([5| can be relaxed to an 
inequality constraint of the form 

\\MTx-b\\ 2 <a, 

where || • ||2 denotes the L2-norm of a vector, for some 
small constant a to allow imprecise measurements or 
noise in the data. The relaxed minimization problem 
is often known as the basis pursuit denoising (BPDN) 
problem, and the original LI minimization problem ([5| 
is also known as the basis pursuit (BP) problem. 
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(a) Volume rendering of the reciprocal space lattice 
produced from the 3D Fourier transform of d). 




(c) "Observed" diffraction pattern. 



(b) Surface representation of the Ewald sphere with the 
reflections that fall on it and marked as red dots. 




(d) Volume rendering of the real space volume showing 
the rotated crystal lattice. 
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(e) Reconstructed real space volume using the 3D inverse 
Fourier transform of the observed data. 
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(f)Reconstructed real space volume using the LI 
minimization method for 300 iterations. 



FIG. 2: Test problem: 64 3 volume populated with a cubic lattice of period 8x8x8. 
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V. ALGORITHMS FOR SOLVING THE LI 
MINIMIZATION PROBLEM 

The LI minimization problem ([5| and its BPDN re- 
laxed form can be solved in a number of ways. In the 
software package SPGLl^, which we use to perform the 
numerical experiments shown in the next section, the 
BPDN problem is reduced to a sequence of what is known 
as the LASSO^ 1 problems 



such that 



\MFx-b\\ 2 
\Mi<r, 



(6) 



where r is a parameter that is determined in an iterative 
process that involves finding the root of nonlinear equa- 
tion (j){r) = <t, where <f>(r), which is the optimal value 
of the objective function in ([6| for a given r, is known 
as the Pareto curve. The LASSO problem is solved by a 
spectral projected gradient met ho cP^^ in SPGL1. 

An alternative approach for solving the BPDN prob- 
lem is to apply a first-order method developed by Y. 
Nestero\ J 28 * 29 ' to solve ^ directly. A software package 
based on this approach is called NESTAPP. 

The computational cost of both SPGL1 and NESTA is 
dominated by the calculation of Tx and J 7 *^, i.e., 3D fast 
Fourier and inverse Fourier transforms required in each 
iteration. Therefore, the overall cost of an autoindex- 
ing scheme based on LI minimization formulation of the 
problem is higher compared to the existing approaches. 
However, as we will see in the next section, the advantage 
of the method is that it can recover the real space lattice 
points reliably even when only a few Bragg peaks can be 
identified on a diffraction image. 



VI. COMPUTATIONAL EXPERIMENT 

To test the algorithm we created a 64 3 real space vol- 
ume which was then populated with a cubic lattice that 
contains 8x8x8 voxels. A rotation was then applied to 
the lattice (fig. |2(a)[ ) and the result was Fourier trans- 
formed to generate the 3D diffraction volume (fig. |2(d) ). 
The size of the problem was chosen for visualization pur- 
poses. 

The simulated diffraction data was calculated by se- 
lecting those voxels which are crossed by the Ewald 
sphere (fig. |2(b) ) and projecting them onto the detec- 
tor plane according to the geometry shown in Figure [I] 
Figure 2(c)| shows the simulated 2D diffraction pattern. 
The detector plane is uniformly sampled with 64 x 64 
pixels, and we set the distance between the crystal and 
the detector to 64 pixels. 

We then tried to recover the real space lattice using two 
methods. In the first method we simply took the inverse 
3D FFT of the diffraction volume in which the voxels 
that were not "observed" were set to zero. The in tensity 
of the transformed volume is shown in Figure 2(e) In the 
second approach, we solve the LI minimization problem 
([5| discussed in the previous section by using the SPGL1 
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FIG. 3: Normalized intensity values of the voxels recon- 
structed, sorted in descending order. The blue plot corre- 
sponds to the 3D Fourier transform method while the orange 
plot corresponds to the LI minimization method (after 300 
iterations). 



software provided by the authors oP^. The amplitude of 
the reconstructed real space lattice \x\ 2 after 300 itera- 
tions of SPGL1 is shown in Figure [2(f)] After only 10 
iterations it is already possible to see the unit cell posi- 
tions clearly enough to determine the crystal orientation 
and unit cell dimensions. 

It is clear from Figures 2(e) and |2(f)| that the latter 
approach results in a much sharper image from which 
the real space lattice points can be easily extracted. 

To quantify this difference we normalized recovered 
real space intensity values of both methods and sorted 
them in decreasing order. We plotted the sorted val- 
ues as ID curves in Figure [3j The curve that separates 
the orange and blue region of the plot is associated with 
the solution to the LI minimization problem. The curve 
that separates the blue region and the white area above 
it is associated with the sorted intensities obtained from 
a direct 3D inverse FFT. Clearly, the intensity associ- 
ated with the solution to the LI minimization problem 
decreases much more rapidly, thereby making it easy to 
select a threshold (shown as the magenta line in Figure [3| 
that can be used to identify real space lattice points. 



VII. CONCLUDING REMARKS 

We presented a new technique for autoindexing 
nanocrystal diffraction images. The technique is based 
on dividing the indexing problem in three steps. We re- 
formulate the critical step of recovering a real space 3D 
map of the lattice as an LI minimization (or BP) problem 
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and solving the problem by an efficient and robust numer- 
ical algorithm. A simple numerical thresholding reveals 
the positions of the 3D lattice points in real space. They 
can subsequently be used to determine the unit cell pa- 
rameters, crystal orientation and type as currently done 
in existing crystallographic software. Mirror symmetries 
of the lattice generate multiple solutions that still need 
to be sorted out in a final step using the measured inten- 
sities. 

Although the algorithm is more costly than the exist- 
ing approach because it is iterative and performs multi- 
ple 3D FFTs, it has the advantage of recovering crystal 
lattice reliably when only a few Bragg peaks can be mea- 
sured. Greedy algorithms that make use of the sparsity 
of the solution may avoid this problem and increase speed 
significantly 3 ^^. 

Once the lattice vectors and orientation are deter- 
mined, the lattice coordinates that overlap with the 
Ewald sphere will provide the index of a reflection. 

We demonstrate the feasibility of the technique with a 
simple example. More studies are needed to test the effi- 



cacy of the method on different types of Bravais lattices 
and on datasets that may be contaminated with noise. 
However, we believe that our preliminary results already 
indicate that compressive sensing based autoindexing is a 
promising tool for ultrafast nanocrystallography. More- 
over, this type of technique allows other constraints to be 
easily incorporated into LI minimization formulation to 
improve the reliability of indexing. It may even be pos- 
sible to extend this approach to index multiple crystals, 
powder diffraction or Laue data. 
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